A Framework for Database Mining

Authors

  • Himanshu Gupta
  • Iain McLaren
  • Alfred Vella

Abstract

Data mining is an attempt to automatically extract useful information from passive data using various artificial intelligence techniques [2]. Conventional database systems offer little support for data mining applications. At the same time, statistical and machine learning techniques usually perform poorly when applied to large data sets. These twin limitations motivate the development of algorithms that extract useful knowledge from large data sets efficiently and within a reasonable time. Such algorithms should operate directly on data held in relational database systems, so that the size of the data set that can be processed is no longer restricted [5]. This abstract outlines a framework for implementing these database mining algorithms [3].

Early attempts at database mining were based on a loosely coupled architecture [1], in which SQL commands are embedded within a host programming language. The machine learning algorithm retrieves records from the database one tuple at a time, switching control to the database on each retrieval. Recent database mining systems have begun to integrate the machine learning and database components [4] by pushing the machine learning algorithms into the execution engine of the database. This reduces execution time by avoiding context switching between the database and the machine learning algorithm. The potential of this approach is enhanced by the fact that modern database packages allow user-defined procedures to be written and stored in the database for execution. To implement these algorithms, the relevant sections of code must be identified within standard algorithms and re-coded as user-defined database functions.

Conventional database systems also have difficulty efficiently supporting the queries generated during the data mining process. Their indices are designed to support predictable queries, whereas the queries generated during data mining cannot be predicted in advance.

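As a minimal sketch of the two architectures, the code below uses Python's built-in `sqlite3` module as a stand-in for a database that accepts user-defined functions. It computes the class entropy used when choosing a decision-tree split. That fragment is chosen here only for illustration, since the abstract does not name a specific algorithm. `entropy_loose` follows the loosely coupled pattern and fetches tuples one at a time into the host program. `EntropyAgg` re-codes the same fragment as a user-defined aggregate that the engine calls during query execution. The `customers` table, its columns, the data, and all function and class names are hypothetical. SQLite runs inside the Python process, so this sketch shows the structural difference only and does not reproduce the context-switching cost of a client/server system.

```python
import math
import sqlite3
from collections import Counter


def _entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_loose(conn: sqlite3.Connection) -> float:
    """Loosely coupled style (hypothetical helper): the host program pulls
    tuples out of the database one at a time and does all the work itself."""
    counts: Counter = Counter()
    cur = conn.execute("SELECT buys FROM customers")
    row = cur.fetchone()
    while row is not None:  # one fetch call per tuple
        counts[row[0]] += 1
        row = cur.fetchone()
    return _entropy(counts)


class EntropyAgg:
    """Integrated style (hypothetical): the same fragment re-coded as a
    user-defined aggregate, so the engine feeds it rows while executing."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def step(self, value) -> None:  # called by SQLite once per row
        self.counts[value] += 1

    def finalize(self) -> float:    # called by SQLite once per group
        return _entropy(self.counts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER, buys INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(22, 0), (25, 0), (28, 1), (35, 1), (45, 1), (52, 0)],
)
conn.create_aggregate("entropy", 1, EntropyAgg)

# Whole-table class entropy, computed both ways.
print(entropy_loose(conn))                                                # 1.0
print(conn.execute("SELECT entropy(buys) FROM customers").fetchone()[0])  # 1.0

# Entropy on each side of a candidate split, evaluated inside the engine.
split = conn.execute(
    "SELECT age < 30 AS young, entropy(buys) FROM customers "
    "GROUP BY age < 30 ORDER BY young"
).fetchall()
print(split)
```

Because the aggregate runs inside the query, a candidate split can be scored with one `GROUP BY` statement rather than a fetch loop in the host program.
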
Support for the unpredictable queries generated by data mining is provided by the general-purpose multi-dimensional indexing techniques found in spatial and advanced database systems. This type of index needs to be available if database mining algorithms are to be successful [6].

Extracting meaningful models from very large data sets is a complex and time-consuming task, and most conventional artificial intelligence algorithms cannot cope with databases of this size. One technique developed to overcome this problem is incremental learning, which generates a model for an initial data set and then updates that model as new records are added. Iterative learning is also applicable, because models in the data mining process are generated over several iterations. Current database mining systems regenerate the whole model on each iteration. Iterative modelling techniques instead use the model generated in the previous pass as the starting point for the new one. By avoiding repeated learning at each iteration, this approach produces a more accurate model in less time with each pass through the process. Finally, to deploy a generated model efficiently in a conventional database, the model must be convertible into the form of an SQL query.

This abstract has outlined a framework for database mining built on an integrated architecture that can support data mining within a database environment. The framework also identifies several issues that must be addressed to perform database mining efficiently within this architecture. These include the use of incremental and iterative learning techniques on large data sets, and the conversion of the resulting models into SQL queries. With this set of requirements in mind, we are investigating approaches to data mining that fit the proposed framework.
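
The abstract leaves the learning algorithm open, so the following is an illustration only. The sketch below uses a hypothetical one-attribute rule model that predicts, for each attribute value, the majority class seen so far. Its state is just a table of class counts, which means new records can be folded in without regenerating the model (incremental learning). The current rules can also be emitted as an SQL `CASE` expression for deployment in the database. `IncrementalRuleModel`, its methods, the `region` column, and the data are all hypothetical. `sqlite3` is used only as a convenient engine for evaluating the generated query.

```python
import sqlite3
from collections import Counter, defaultdict


class IncrementalRuleModel:
    """Hypothetical one-attribute classifier: for each attribute value it
    predicts the majority class seen so far. Its whole state is a table of
    counts, so new records update the model instead of rebuilding it."""

    def __init__(self) -> None:
        self.counts: defaultdict = defaultdict(Counter)

    def update(self, records) -> None:
        """Fold a batch of (attribute_value, class_label) pairs into the counts."""
        for value, label in records:
            self.counts[value][label] += 1

    def rules(self) -> dict:
        """Majority class per attribute value (ties go to the smaller label)."""
        return {
            value: min(dist.items(), key=lambda kv: (-kv[1], kv[0]))[0]
            for value, dist in self.counts.items()
        }

    def to_sql(self, column: str, default) -> str:
        """Render the current rules as an SQL CASE expression.

        Assumes `column` is a trusted identifier and that values and labels
        are ints or strings; strings are quoted with '' escaping.
        """
        def lit(x) -> str:
            if isinstance(x, int):
                return str(x)
            return "'" + str(x).replace("'", "''") + "'"

        whens = " ".join(
            f"WHEN {lit(v)} THEN {lit(label)}"
            for v, label in sorted(self.rules().items(), key=lambda kv: str(kv[0]))
        )
        if not whens:  # no rules learned yet: CASE needs at least one WHEN
            return lit(default)
        return f"CASE {column} {whens} ELSE {lit(default)} END"


model = IncrementalRuleModel()
model.update([("north", 1), ("north", 1), ("south", 0)])  # initial data set
model.update([("south", 1), ("south", 1), ("east", 0)])   # records added later
print(model.rules())  # {'north': 1, 'south': 1, 'east': 0}

expr = model.to_sql("region", default=0)
print(expr)

# Deploy the model: the database evaluates the rules itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "north"), (2, "east"), (3, "west")],
)
rows = conn.execute(
    f"SELECT id, {expr} AS predicted FROM customers ORDER BY id"
).fetchall()
print(rows)  # [(1, 1), (2, 0), (3, 0)]
```

Records whose value was never seen (here `'west'`) fall through to the `ELSE` branch, so the default class has to be chosen explicitly. A decision tree could be rendered in a similar way as nested `CASE` expressions, though that is not shown here.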

Similar sources

Developing a Recommendation Framework for Tourist by Mining Geo-tag Photos (Case Study Tehran District 6)

With the increasing popularity of sharing media on social networks and facilitating access to location technologies, such as Global Positioning System (GPS), people are more interested to share their own photos and videos. The world wide web users are no longer the sole consumer but they are producers of information also, hence a wealth of information are available on web 2.0 applications. The ...

An Authorization Framework for Database Systems

Today, data plays an essential role in all levels of human life, from personal cell phones to medical, educational, military and government agencies. In such circumstances, the rate of cyber-attacks is also increasing. According to official reports, data breaches exposed 4.1 billion records in the first half of 2019. An information system consists of several components, which one of the most im...

A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a ...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses, it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

Mining Sequential Trees in a Tree Sequence Database

Tree structures are used extensively in domains such as XML data management, web log analysis, biological computing, and so on. In this paper we introduce the problem of mining frequent sequential trees in a large tree sequence database. We present a framework for mining frequent sequential trees in a so-called tree sequence database. Basically, this framework employs a transformation-based app...

Integrating Inductive and Deductive Reasoning for Database Mining

Database mining is the process of finding previously unknown rules and relations in large databases. Often, several database mining techniques must be used cooperatively in a single application. In this paper we present the Recon database mining framework, which integrates three database mining techniques: rule induction, rule deduction, and data visualization.

Publication date: 1997